WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification:
     G06F 13/00, 17/21, 3/00, G06T [illegible]

A1

(11) International Publication Number: WO 99/32982
(43) International Publication Date: 1 July 1999 (01.07.99)

(21) International Application Number: PCT/US98/27322
(22) International Filing Date: 21 December 1998 (21.12.98)

(30) Priority Data:
     08/997,209    23 December 1997 (23.12.97)    US

(63) Related by Continuation (CON) or Continuation-in-Part
     (CIP) to Earlier Application
     08/997,209 (CIP)
     Filed on 23 December 1997 (23.12.97)

(71) Applicant (for all designated States except US): ADOBE
     SYSTEMS INCORPORATED [US/US]; 345 Park Avenue,
     San Jose, CA 95110-2704 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): RAMAN, V. [IN/US];
     900 High School Way #2326, Mountain View, CA 94041
     (US). CARO, Perry, A. [US/US]; 1269 Glen Haven Drive,
     San Jose, CA 95129 (US).

(74) Agent: GARCIA, Edouard, A.; Fish & Richardson P.C., Suite
     100, 2200 Sand Hill Road, Menlo Park, CA 94025 (US).

(81) Designated States: CA, CN, JP, US, European patent (AT, BE
     NL, PT, SE).

Published
     With international search report.
     Before the expiration of the time limit for amending the
     claims and to be republished in the event of the receipt of
     amendments.

(54) Title: DESCRIBING DOCUMENTS AND EXPRESSING DOCUMENT STRUCTURE



[Front-page figure: block diagram with the labeled elements Client, Application Program, Expression Server, Methods, Disk Storage, Document File, Meta-Data Summary File, Client, Application, and Structural Expression.]



(57) Abstract

Apparatus and methods of revealing the hierarchical structure of a document having content of a characteristic type of content are
described (100). The hierarchical structure may be [illegible] more
node (126). A semantic representation for interpreting [illegible] description files are used to
encapsulate structural and meta information associated [illegible]
external to native application files and [illegible] description files point to the referenced document
data using uniform resource locators (URLs) and serve as virtual documents. [illegible] applications can choose to
encode additional structural information in the document description [illegible]



Structure of the document, expressing, independently of the document content type, 
the hierarchical structure of the document as a tree structure of one or more nodes; and 
providing a semantic representation for interpreting the tree structure. 

The client request may comprise a request for information relating to the 
position of one or more nodes within the tree structure. The expressing step may
comprise expressing the hierarchical structure of the document at a level of detail 
specified in the client request. The expressing step may comprise associating with a 
given node an attribute indicating the relative detail level represented by the given 
node. In response to a client request for structural information about the given node at
a level of detail that is different from the level of detail indicated by the attribute
associated with the given node, the hierarchical structure of the document may be 
expressed, independently of document content type, as a tree structure of one or more
nodes, including the given node, at the detail level specified in the client request. 

The expressing step may comprise associating with a given node an attribute 
identifying a second semantic representation for the structural feature of the document
represented by the given node, the second semantic interpretation being different from
the first semantic interpretation. In response to a client request, the second semantic
representation for interpreting the given node may be provided. In response to a client
request, access to document content may be provided based on the second semantic
representation.

In yet another aspect, the invention features a document description file, stored 
on a computer-readable medium, for describing the hierarchical structure of a 
document having content of a characteristic type of content. The document 
description file comprising: a tree structure of one or more nodes expressing, 
independently of the document content type, the hierarchical structure of the
document; and a semantic representation for interpreting the tree structure. 

An attribute may be associated with each tree node that describes the semantic 
character of the associated tree node. The semantic representation may be based upon
the document content type. The semantic representation may be independent of the
document content. The semantic representation may define parent-child relationships
among the nodes. A child-count attribute may be associated with a node that is
indicative of whether the node has associated child nodes that are not yet expressed in
the tree structure.
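
The child-count idea can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `Node` class and an `express` helper, neither of which appears in the source; the node names and the HEADING and PARAGRAPH labels are taken from the examples used elsewhere in this description. When the tree is expressed at a reduced depth, each truncated node records how many of its children were left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: a name, a semantic label, and child nodes."""
    name: str
    label: str
    children: List["Node"] = field(default_factory=list)
    child_count: int = 0  # number of child nodes not yet expressed


def express(node: Node, depth: int) -> Node:
    """Copy the tree rooted at `node`, expanding at most `depth` levels below it.

    A truncated node keeps no children but records how many it has, so a
    client can tell that more structure is available on request.
    """
    if depth == 0:
        return Node(node.name, node.label, [], child_count=len(node.children))
    expressed = [express(child, depth - 1) for child in node.children]
    return Node(node.name, node.label, expressed, child_count=0)


full = Node("Node_2.10", "HEADING", [
    Node("Node_2.11", "PARAGRAPH"),
    Node("Node_2.12", "PARAGRAPH"),
])
summary = express(full, 0)   # heading only, children withheld
detailed = express(full, 1)  # heading with both paragraphs expressed
```

In this sketch a non-zero `child_count` signals unexpressed structure, which matches one reading of the attribute; other encodings are possible.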

A second document description file may be provided comprising: a second 
tree structure of one or more nodes expressing, independently of the document content 
type, the hierarchical structure of the document; and a second semantic representation 
which is different from the first semantic representation for interpreting the second
tree structure. 

An attribute may be associated with a given node identifying a second 
semantic interpretation for the structural feature of the document represented by the 
given node, the second semantic interpretation being different from the first semantic 
interpretation. 

In another aspect the invention features a document description file, stored on 
a computer-readable medium, for describing the hierarchical structure of a document 
having content of a characteristic type of content, comprising: a tree structure of one 
or more nodes expressing, independently of the document content type, the
hierarchical structure of the document; a semantic representation for interpreting the 
tree structure; and information relating to document content within the hierarchical 
structure expressed by one or more tree nodes produced in response to a client request 
for document content associated with one or more tree nodes. 

The information relating to document content may comprise a pointer to the
requested document content. The information relating to document content may
comprise the requested document content. 

The invention also features a document description file, stored on a computer- 
readable medium, for describing the hierarchical structure of a document having 
content of a characteristic type of content, comprising: a tree structure of one or more
nodes expressing, independently of the document content type, the hierarchical
structure of the document, the tree structure being produced in response to a client
request for information relating to the hierarchical structure of the document; and a 
semantic representation for interpreting the tree structure. 

The tree structure may express the hierarchical structure of the document at a 
level of detail specified in the client request. An attribute may be associated with a
given node indicating the relative detail level represented by the given node. A tree
structure of one or more nodes, including the given node, may be provided that
expresses, independently of document content type, the hierarchical structure of the
document at a detail level specified in a client request for structural information about
the given node, the requested detail level being different from the level of detail
indicated by the attribute associated with the given node. An attribute may be
associated with a given node identifying a second semantic interpretation for the
structural feature of the document represented by the given node, the second semantic
interpretation being different from the first semantic interpretation.

In another aspect, the invention features a method executed on a computer for
generating a first document description file for describing a document stored on a
computer-readable medium. The method generates a description of an application
which produced the document, generates a description of a location from which the
document can be obtained, and generates a description of an operation that can be
performed on the document to produce a second document description file. The
description of the location may be a uniform resource locator. The uniform resource
locator may identify a server configured to produce the document upon request. The
uniform resource locator may identify a location at which the document is stored. The
content of the first document description file and the content of the second document
description file may be represented in XML syntax.
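
As a rough illustration of the three descriptions named above (application, location, operation) expressed in XML syntax, the following Python sketch builds a small description file with the standard-library `xml.etree.ElementTree` module. The element and attribute names (`ddf`, `application`, `location`, `operation`, `href`), the example URL, and the version string are assumptions made for this sketch; the source does not define a concrete schema here.

```python
import xml.etree.ElementTree as ET


def make_ddf(application: str, version: str, location: str, operation: str) -> str:
    """Build a hypothetical document description file as an XML string."""
    root = ET.Element("ddf")
    # Description of the application which produced the document.
    ET.SubElement(root, "application", {"name": application, "version": version})
    # Description of a location from which the document can be obtained (a URL).
    ET.SubElement(root, "location", {"href": location})
    # Description of an operation that can be performed on the document.
    ET.SubElement(root, "operation", {"name": operation})
    return ET.tostring(root, encoding="unicode")


xml_text = make_ddf(
    "FrameMaker", "5.5", "http://example.com/docs/report.fm", "extract-pages"
)
```

The resulting string is only a few hundred bytes regardless of the size of the document it points to, which is the property the later discussion of DDF file size relies on.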

The operation may be a transformation of the document from a file stored in a
first storage format to a file stored in a second storage format, and the second
document description file describes the file stored in the second storage format. The
second document may describe the first document description file. The operation may
extract information from the document, and the second document description file
describes the information extracted from the document. The second document
description file may describe the first document description file. The information
extracted from the document may describe a range of pages of the document. The
document may represent a multi-layered graphical object, and the information
extracted from the document describes a subset of the layers of the multi-layered
graphical object.

The method may generate application-specific data describing the
document. The application-specific data may be a name of an application that
produced the document. The application-specific data may be a version number of an
application that produced the document.

The method may generate a field containing information describing the
document. The field may be an HTTP header. The field may describe a date on
which the document was produced. The field may describe a date on which the 
document was modified. The field may describe a size of the document. The field 
may describe content contained in the document. 

In another aspect, the invention features a method for processing a request
document description file stored on a computer-readable medium, the request
document description file describing a source document and an operation to be
performed on the source document. The request document description file is received
from a client, the source document is retrieved, the operation is applied to the source
document to produce information derived from the source document, and a response
document description file is generated containing a description of the information
derived from the source document.

The information derived from the source document may be a result document.
The response document description file may be a pointer to the result document. The
description of the information derived from the source document may be the result
document. The description of the information derived from the source document may
be a pointer to the result document. The pointer may be a uniform resource locator.
The response document description file may be generated by generating a description
of the source document. The description of the source document may be the source
document. The description of the source document may be a pointer to the source
document. The pointer may be a uniform resource locator. The response document
may be transmitted to the client. The information derived from the source document
may be transmitted to the client.
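
The request-processing steps above (receive the request, retrieve the source document, apply the operation, generate a response that points to the result) can be sketched as follows. This is a minimal illustration under stated assumptions: the `DDF` record, the `process_request` function, the `mem://` locations, and the `first_page` operation are all hypothetical, and an in-memory dictionary stands in for real document storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DDF:
    """Hypothetical in-memory stand-in for a document description file."""
    location: str          # pointer (URL) to a document
    operation: str = ""    # operation requested on that document, if any
    description: str = ""  # free-form description of the pointed-to content


def process_request(
    request: DDF,
    fetch: Callable[[str], str],
    operations: Dict[str, Callable[[str], str]],
    store: Callable[[str], str],
) -> DDF:
    """Apply the requested operation and return a response description."""
    source = fetch(request.location)                 # retrieve the source document
    result = operations[request.operation](source)   # apply the operation
    result_url = store(result)                       # keep the result, get a pointer
    return DDF(
        location=result_url,
        description=f"{request.operation} of {request.location}",
    )


# Toy storage: form feeds separate the pages of a plain-text "document".
docs = {"mem://doc1": "page1\fpage2\fpage3"}
ops = {"first_page": lambda text: text.split("\f")[0]}


def store(content: str) -> str:
    url = f"mem://result{len(docs)}"
    docs[url] = content
    return url


response = process_request(DDF("mem://doc1", "first_page"), docs.__getitem__, ops, store)
```

Here the response carries a pointer to the result rather than the result itself; returning the content inline would be the other variant the text describes.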

In another aspect, the invention features a document description file, stored on
a computer-readable medium, for describing a document stored on a
computer-readable medium, the document description file containing a description of an
application program which produced the document, a description of a location from
which the document can be obtained, and a description of an operation that can be
performed on the document to produce a second document description file. The
operation that can be performed on the document may be a transformation of the
document from a file stored in a first storage format to a file stored in a second storage
format, and the second document description file describes the file stored in the
second storage format. The operation that can be performed on the document may be
extraction of information from the document, and the second document description
file describes the information extracted from the document. The document
description file may contain a description of an operation to be performed on the
document.

A document description format (DDF) file encapsulates the location of a 
document along with useful descriptive information about the document. This
enables authoring applications to capture and export information about content
without requiring changes to current authoring application file formats. A DDF can
be used as a virtual document to capture as much or as little information about data 
contained in a native authoring application file as is desired by the application and/or 
user. 

Among the advantages of the invention are one or more of the following: 
One advantage of the invention is that the content of a document description 
format (DDF) file is independent of the authoring application used to produce the file 
described by the DDF. Different authoring applications can therefore use DDF files to
cooperatively manipulate, synthesize, and exchange document data. Although DDFs 
are independent of application-specific data, application-specific data can optionally 
be encapsulated within a DDF in order to optimize certain operations.

Another advantage of the invention is that the size of a DDF file is typically 
much smaller than the document which it describes. A typical DDF file is a few 
hundred bytes long. This aspect of the invention is particularly advantageous when 
used in conjunction with files, such as multimedia files, which are typically very large. 
Because the size of a DDF file is independent of the size of the file described by the 
DDF, the size of a DDF file will typically not increase if the size of the document
described by the DDF increases. Local storage of DDF files instead of native files can
therefore result in significant storage savings.

Another advantage of the invention is that, as a result of the small size of DDF
files, exchange of DDF files is more efficient than exchange of the native files
described by the DDFs. Many user-level manipulation tasks involve retrieving a
diverse set of data items, assembling them, filtering out certain data items, and
eventually creating a data aggregation consisting of the data items of interest. The
intermediate steps of such operations can be performed more efficiently by using
DDFs than by using native files, because of the small size of DDF files and the
selective encapsulation of structural information and meta-data provided by DDFs.

Another advantage of the invention is that DDF-aware clients, servers, and 
applications use late binding, i.e., a reference within a DDF to a native file is not 
bound to the content of the native file until it is actually necessary to access such 
content, such as when the file is to be printed. Use of late binding reduces the number 
of temporary files that are produced when performing a series of operations on a file,
thereby increasing the efficiency of such operations. 
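
Late binding as described above can be illustrated with a small Python sketch. The `LazyReference` class and the `mem://big-file` location are hypothetical names introduced for this example; the point is only that the reference carries a location and fetches the native content the first time it is actually needed.

```python
from typing import Callable, Optional


class LazyReference:
    """Hypothetical late-bound reference from a DDF to a native file."""

    def __init__(self, url: str, fetch: Callable[[str], str]) -> None:
        self.url = url
        self._fetch = fetch
        self._content: Optional[str] = None
        self.bound = False  # becomes True once the content has been retrieved

    def content(self) -> str:
        """Bind on first access, then reuse the retrieved content."""
        if not self.bound:
            self._content = self._fetch(self.url)
            self.bound = True
        return self._content


calls = []
ref = LazyReference("mem://big-file", lambda url: calls.append(url) or "payload")
# Intermediate operations can pass `ref` around without touching the native file.
first = ref.content()   # e.g. when the file is finally printed
second = ref.content()  # no second retrieval
```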

Another advantage of the invention is that implementing use of the document
description format requires minimal modifications to existing application programs.
A simple plug-in to an existing application program can be used to enable the
application program to save descriptions of existing documents. DDF client software
running on a client workstation manipulates DDF files and handles client-server DDF 
transactions without requiring any modification to existing application programs. 

Another advantage of the invention is that it makes more efficient use of client 
resources. For many transactions, clients need only store and exchange DDF files 
which are typically much smaller than the application-specific files to which they 
correspond. Furthermore, because all processing of application-specific files is 
performed by application-specific servers, the number of authoring applications that 
clients need to store and execute is reduced. Furthermore, application-specific servers
can be optimized to process files produced by specific applications, thereby increasing
processing efficiency.

Other features and advantages of the invention will become apparent from the
following description, including the drawings and the claims. 

Brief Description of the Drawings
Fig. 1A is a schematic view of a document.

Fig. 1B is a schematic view of a tree structure expression of the structure
inherent in the document of Fig. 1A.

Fig. 1C is a schematic view of the tree structure of Fig. 1B and a semantic
representation for interpreting the tree structure.

Fig. 1D is a schematic view of the tree structure and semantic representation of
Fig. 1C expressed at a level of detail that is less than the level of detail expressed in
Fig. 1C.

Fig. 2 is a block diagram of a network configured to reveal document
structure.

Fig. 3 is a flow diagram of a method of revealing document structure.

Fig. 4 is a block diagram of a network and computer hardware and software
configured to manipulate document description format files.

Fig. 5 is a block diagram of the communications that take place among a
subset of the elements shown in Fig. 4.

Fig. 6 is a flowchart of a method for applying a transformation to a document
description format file.

Fig. 7 illustrates a computer and computer elements suitable for implementing 
the invention. 

Detailed Description 
The following section describes the general features of a method for 
expressing the structure of a document. This section is followed by a description of a 
method for describing documents using a novel document description format (DDF) 
and a method for expressing document structure within the context of that format. 
General Features of a Scheme for Revealing Document Structure
Referring to Fig. 1A, a document 100 (e.g., an article created by the Adobe®
FrameMaker® authoring program) has an inherent nested (or hierarchical) structure.
Document 100 is composed of a general heading 102 which is nested within an article
104. Sections 106, 108 are nested within general heading 102. Paragraphs 110 and
112 are nested within a sub-heading 114 which, in turn, is nested within section 106.
Paragraph 116 and graphic 118 are nested within a sub-heading 120 which, in turn, is
nested within section 108. Graphic 118 is composed of nested layers 122, 124.

As shown in Fig. 1B, the hierarchical structure of document 100 may be
expressed independently of the document content type as a tree structure 126 of one or
more nodes. The hierarchical structure of document 100 is expressed by the parent-child
relationships established in tree structure 126. Thus, nodes 2.11 and 2.12
depend from, or are the "children" of, Node_2.10. Similarly, nodes 2.21 and 2.22 are
the children of Node_2.20, and nodes 2.1 and 2.2 are the children of Node_2.0.
Node_2.22 is the parent node for nodes 2.221 and 2.222. The parent-child 
relationships defined by the nodes of tree structure 126 are expressed independently of
the type of content in document 100 - i.e., the tree structure encapsulates only the
hierarchical structure of document 100, not its content. For example, nodes 2.221 and
2.222, which correspond to layers 122, 124 of graphic 118, are expressed as the
children of Node_2.22 in the same way as Node_2.11 and Node_2.12, which
represent paragraphs 110, 112, are expressed as the children of Node_2.10.
The features of tree structure 126 are controlled by a semantic representation 
which attaches a meaning to each of the nesting levels within the tree structure. The 
meaning of the nesting levels will generally vary with the type of content in document 
100. For example, documents created by the Adobe® FrameMaker® authoring 
program are likely to have chapters and sections, whereas documents created by the 
Adobe® Photoshop® authoring program or the Adobe® Illustrator® authoring 
program are likely to be composed of layers. The semantic representation therefore
provides an interpretation of the tree structure of parent-child nodes. Different
semantic representations may be used to express the structure of the same document
instance. In other words, the hierarchical structure of a particular document may be
expressed by more than one tree structure and associated semantic representation. For
example, the hierarchical structure of a document created by the Adobe®
FrameMaker® authoring program also may be expressed in terms of chapters and 
sections or, alternatively, in terms of pages each of which has its own independent 
hierarchical structure. 

A particular expression of the hierarchical structure of a document may be 
interpreted by associating name attributes (or "labels") with each of the tree nodes in
accordance with a suitable naming model (or "namespace") which is, at least in part,
selected based upon the type of content contained within the document. For example,
a suitable namespace for document 100 may include the attributes identified in Table 1.



Attribute Label    Attribute Meaning
---------------    -----------------
DIVISION           A sequence of SECTIONS
SECTION            A sequence of PARAGRAPHS and FIGURES
PARAGRAPH          Text content
HEADING            Text content identifying DIVISIONS & SECTIONS
FIGURE             A sequence of LAYERS
LAYER              Graphic content

Table 1: A semantic representation for a document created by the
Adobe® FrameMaker® authoring program.

As shown in Fig. 1C, an expression 128 of the hierarchical structure of
document 100 includes a tree structure of one or more nodes and a set of namespace
attributes which are associated with the tree nodes and identify the semantics of the
structural features of document 100 that are represented by the tree nodes. For
example, expression 128 includes Node_2.10 and an associated HEADING attribute,
which together express the structure of section 106 of document 100 by the position of
Node_2.10 in the tree structure and the meaning associated with the attribute
HEADING.

Expression 128 represents a generic way to encode hierarchical structures and
their associated meta-data. Each node in this structure reveals its position in the 
logical structure through an associated namespace attribute. In the embodiment 
presented in Table 2 below, a suitable namespace includes several models that contain 
document content classifications which express different levels of detail. The models 
follow a simple inheritance hierarchy whereby every model is also a member of its 
parent model. For example, RasterGfxCModel is also a GfxCModel and a CModel;
that is, everything that can be classified with the RasterGfxCModel can also be
classified with the GfxCModel and the CModel. This inheritance scheme allows
structural expression applications to express the hierarchical structure of a document
with a relatively specific level of detail, while allowing clients reading the structure to
deal with structural elements more generally. For example, a full-text search engine
might not need to distinguish between a HEADING and a PARAGRAPH in a
document structure expressed using the FlowTxtCModel. In such a situation, the
search engine may simply treat all classifications as if they were a classification (e.g.,
TEXT) in the parent model TxtCModel.




Root Model
    Model: root of all models
        ELEMENT: an element
        DOCUMENT: a structured document
        PORTFOLIO: collection of structured elements
        REFERENCE: indirect reference to an element in another Portfolio
        UNKNOWN: element of unknown classification

All Content Models
    CModel: all content models
        UNBOUND: for content that is explicitly unspecified
        CONTENT: all sub-models are content models
        UNIT: collection of content elements treated as a single unit

Specific Content Models
    DynCModel: dynamic media content
        CLIP: a time-based sequence
        OVERLAY: overlay
        TIMEBASE: time-base
    [illegible]: audio content
        [illegible]: a stream of audio data
    VideoDynCModel: video content
        TRACK: a stream of video data
    GfxCModel: graphics content
        FIGURE: graphics content
        TEXTURE: texture
    DocGfxCModel: final form document content
        PAGE: a printable page
    LayeredGfxCModel: layered graphics content
        LAYER: a layer
    MultiGfxCModel: multi-model graphics
        ARTWORK: an artwork view
        PREVIEW: preview
        WIREFRAME: wireframe view
    RasterGfxCModel: raster graphics content
        COLORCHAN: a color channel
        RASTER: raster graphic (bitmap)
    TriDGfxCModel: 3D graphics content
        LIGHT: a light source
        POLYGON: polygon in 3-space
        TRIDMODEL: 3D model
        VERTEX: vertex in 3-space
    VectorGfxCModel: vector graphics content
        CURVE: curve
        LINE: line
        PATH: path
        POINT: point
    LinkCModel: hyperlink content
        LINK: a link
    TxtCModel: text content
        TEXT: any kind of human-readable text
        TCODE: encoded data, such as JavaScript
    FlowTxtCModel: flowing/structured text content
        ARTICLE: a sequence of sections, with optional heading
        FOOTNOTE: footnote
        HEADING: an article, section, or list
        LABEL: text label
        LIST: list of items, with an optional heading
        LIST_ITEM: item in a list
        PARA: paragraph, a unit of text
        SECTION: logical division, with an optional heading
    PlacedTxtCModel: decorative/unstructured text content
        ART_TEXT: text as art, such as letter forms
        PRE: pre-formatted text

Table 2: Namespace models. The model names and classifications are
marked in bold-face font and are followed by a brief description
of their characteristics.
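
The model inheritance described before Table 2 can be sketched in Python as a simple parent map. The parent links shown are those stated in the text (RasterGfxCModel is a GfxCModel and a CModel; TxtCModel is the parent of FlowTxtCModel) plus the root relationships read from Table 2; the dictionary layout and the function names `ancestors`, `is_a`, and `treat_as_text` are assumptions made for this illustration.

```python
from typing import Dict, List

# Child model -> parent model, limited to the links needed for the example.
PARENT: Dict[str, str] = {
    "CModel": "Model",
    "GfxCModel": "CModel",
    "RasterGfxCModel": "GfxCModel",
    "TxtCModel": "CModel",
    "FlowTxtCModel": "TxtCModel",
}

# Classification -> the model that defines it (a subset of Table 2).
CLASSIFICATION_MODEL: Dict[str, str] = {
    "HEADING": "FlowTxtCModel",
    "PARA": "FlowTxtCModel",
    "TEXT": "TxtCModel",
    "RASTER": "RasterGfxCModel",
}


def ancestors(model: str) -> List[str]:
    """Return the model followed by each of its parents up to the root."""
    chain = [model]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(model: str, ancestor: str) -> bool:
    """True if `model` is `ancestor` or inherits from it."""
    return ancestor in ancestors(model)


def treat_as_text(classification: str) -> bool:
    """A full-text search client may index anything under TxtCModel as TEXT."""
    return is_a(CLASSIFICATION_MODEL[classification], "TxtCModel")


raster_chain = ancestors("RasterGfxCModel")
```

A client that only cares about text can call `treat_as_text` and ignore the finer distinction between, say, HEADING and PARA.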



Other namespaces are possible. For example, Tables 3-7 below contain
exemplary semantic representations for sheet music, a simple text document, a
COBOL program, and poetry. Tables 4 (simple text document) and 5 (COBOL
program) contain parallel semantic representations; although they have the same
attribute labels, the semantics are very different. Tables 6 and 7 contain semantic
representations that have different vocabularies for the same content type (i.e., poetry).
A server could use either semantic representation to expose the structure of a poem.
In some embodiments, a client may select the semantic representation used to
decompose the structure of a poem.

Attribute Label    Attribute Meaning
---------------    -----------------
[illegible]        An entire work of music
PART               The part for a particular instrument
STAFF              A sequence of MEASURES
REPEAT             A subsection of a STAFF to be repeated
CODA               A continuation section of a STAFF
MEASURE            A sequence of NOTES
NOTE               A representation of pitch (including silence) and duration

Table 3: A semantic representation for sheet music.

Attribute Label    Attribute Meaning
---------------    -----------------
DIVISION           A sequence of SECTIONS
SECTION            A sequence of PARAGRAPHS
PARAGRAPH          A sequence of SENTENCES
SENTENCE           Text content

Table 4: A semantic representation for a simple text document.

Attribute Label    Attribute Meaning
---------------    -----------------
DIVISION           A sequence of SECTIONS
SECTION            A sequence of PARAGRAPHS
PARAGRAPH          A sequence of SENTENCES
SENTENCE           Code verbs and statements

Table 5: A semantic representation for a COBOL software
program.

Attribute Label    Attribute Meaning
---------------    -----------------
BOOK               A sequence of CANTOS
CANTO              A sequence of VERSES
VERSE              A line of poetic text content

Table 6: A semantic representation for poetry.

Attribute Label    Attribute Meaning
---------------    -----------------
VOLUME             A sequence of POEMS
POEM               A sequence of PASSAGES
PASSAGE            A sequence of LINES
LINE               A verse of poetic text content

Table 7: Another semantic representation for poetry.

A tree structure encodes a specific structural decomposition of a document
instance. Where multiple orthogonal structural decompositions exist, each
decomposition can be expressed by a different synchronous tree structure. For
example, a picture can be decomposed into LAYERS, each of which can also play the
role of a RASTER. Each synchronous tree structure can be represented by a separate
structural decomposition. Alternatively, a node in a tree structure may include a
ROLE attribute which identifies the various other roles played by the structural feature
represented by that node. Attribute ROLE provides alternative interpretations of the
structural feature corresponding to a given node. Referring to Fig. 1C, layers 122, 124
of Figure 2.1 each play the role of a RASTER in addition to their role as a LAYER, as
indicated by the ROLE attributes associated with Node_2.221 and Node_2.222. The
client may use the ROLE attribute in many ways. For example, a client may use the
ROLE attribute to specify the form in which content corresponding to a particular
node should be extracted from a document. A client may also use the ROLE attribute
to enumerate certain structural features in a document (e.g., the number of chapters in
a document).

As much or as little of the structure of a document may be exposed as is
needed. A CHILD-COUNT node attribute is associated with each node to indicate the
relative level of detail that is expressed by a particular tree structure. In one
implementation, a non-zero CHILD-COUNT value indicates that there is additional
structure which is not revealed in a given tree structure. As shown in Fig. 1D,
Node_2.10 and Node_2.20 each has an associated CHILD-COUNT value of 2,
indicating that each of these nodes is the parent of two child nodes which are not
revealed in structural expression 134. In particular, nodes 2.11 and 2.12, which are
the children of Node_2.10, and nodes 2.21 and 2.22, which are the children of
Node_2.20, are not revealed in structural expression 134. Applications that provide
access to such structural expressions may invoke a method which transforms a given
structural expression into a new expression which exposes the tree structure to the
desired level of detail.
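
The level-of-detail mechanism just described can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the implementation described here: the `Node` class, the `expose` function and the sample labels are hypothetical names, and CHILD-COUNT is taken, as in the one implementation mentioned above, to count the children that exist but are not revealed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Full node of a document structure tree (hypothetical model)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def expose(node: Node, depth: int) -> dict:
    """Return a structural expression of `node` revealed to `depth` levels.

    `child_count` is the number of children that exist but are NOT
    revealed, so a non-zero value signals hidden structure.
    """
    if depth == 0:
        # Stop here: report how many children are being withheld.
        return {"is_a": node.label, "child_count": len(node.children), "children": []}
    return {
        "is_a": node.label,
        "child_count": 0,
        "children": [expose(child, depth - 1) for child in node.children],
    }


# Hypothetical document: two sections, each holding two paragraphs.
doc = Node("DOCUMENT", [
    Node("SECTION", [Node("PARA"), Node("PARA")]),
    Node("SECTION", [Node("PARA"), Node("PARA")]),
])

shallow = expose(doc, 1)  # sections revealed, paragraphs withheld
deep = expose(doc, 2)     # the whole tree revealed
```

In `shallow`, each SECTION carries a CHILD-COUNT of 2 and no children; calling `expose` again with a larger depth plays the part of the transformation into a new, more detailed expression.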

In addition to expressing structure through a tree structure and an associated
semantic representation, the structural expression also provides access to document
content corresponding to one or more nodes in the tree structure (e.g., to extract the
text of paragraph 110). The access to document content may be incorporated into the
file containing the structural expression either by value or by reference. Incorporation
by value involves copying the desired document content into the structural expression
file. Incorporation by reference involves placing within the structural expression file a
URL (uniform resource locator) that points to the desired document content.

Referring to Fig. 2, in a client-server implementation, a client workstation 150
running an authoring application 152, such as a word processing authoring application
or a graphics authoring application, may produce a document having content of a
characteristic type, and may save the document in a file 154 on a disk 156. Client 150
may access the content of document file 154 by instructing authoring application 152
to load document file 154 from disk 156 into the memory of client 150, where a user
may access or modify the content of document file 154 with authoring application
152.

A user may also instruct authoring application 152 to store on disk 156 an
expression of the hierarchical structure of document file 154. Authoring application
152 may invoke a structural expression plug-in 157, residing on client workstation
150, to create a meta-data summary file 158 which encodes the hierarchical structure
and associated meta-data of the document. Authoring application 152 exposes as
much (or as little) of the hierarchical structure of document file 154 as is appropriate
for a specific document instance. Structural expression plug-in 157 indicates when
additional structural information may be revealed upon client request by associating a
non-zero CHILD-COUNT attribute with those nodes that have one or more children
which are not expressed in the current meta-data summary file. Structural expression
plug-in 157 enables authoring application 152 to save and retrieve meta-data summary
files without requiring any modifications to authoring application 152. Structural
expression plug-in 157 also allows a user to specify a position in the tree structure
revealed in meta-data summary file 158 and a desired expression depth. In response
to such a specification, structural expression plug-in 157 transforms meta-data
summary file 158 into a new summary file that exposes the tree structure to the
desired level of detail.

In an alternative embodiment, structural expression plug-in 157 may save
information to and retrieve information from transitory data structures in the memory
of client 150 rather than storing meta-data summary files on disk 156. A separate
application or script may provide a batch operation to produce meta-data summary
files corresponding to pre-existing document files. Also, rather than using structural
expression plug-in 157, authoring application 152 may be modified to enable a user to
save and retrieve meta-data summary files directly.

Another client 151 may access information relating to the hierarchical
structure of document file 154 by invoking one or more methods 159 residing on
server 160. In particular, client 151 sends to server 160 a request 162 for information
about the hierarchical structure of document file 154. As discussed above, request
162 may specify the level of detail at which the structural information should be
presented. In response, server 160 invokes one or more methods 159 to retrieve or
otherwise manipulate meta-data summary file 158. Server 160 may send to client 151
a response 166 that contains information about meta-data summary file 158.
Response 166 may encapsulate the entire meta-data summary file, or the response may
simply contain a reference to the storage location of the meta-data summary file on
disk 156. Client 151 may access meta-data summary file 158 through authoring
application program 153 or through a web browser program residing on client
workstation 151.

In addition to being implemented in a client-server environment, the invention
may be implemented in a stand-alone environment. For example, methods 159 may
reside in client workstation 150, and document file 154 and meta-data summary file
158 may be stored in memory located in client workstation 150.

Referring to Fig. 3, in a method of describing the hierarchical structure of a
document, a client may access information relating to the hierarchical structure of the
document as follows. In response to a client request (step 170), the hierarchical
structure of the document is expressed, independently of document content type, as a
tree structure of one or more nodes and an associated semantic representation (step
172). If the initial, or a subsequent, client request specifies the level of detail at which
the hierarchical structure of the document should be expressed, either for a particular
node or for the entire tree structure (step 174), the structural expression is transformed
into a new expression of the document structure at the requested level of detail (step
176). If the client request for information relates to a second semantic role played by
a particular node (step 178), the hierarchical structure of the document is then
expressed, independently of document content type, as a second tree structure of one
or more nodes that is controlled by a semantic representation corresponding to the
second semantic role played by that node (step 180). If the client request relates to
document content corresponding to a particular node (step 182), the client is provided
access to the requested content, either by value or by reference (step 184).

A Document Description Format:

A Document Description Format (DDF) is a method for describing a 
document using a mark-up language that identifies the components of individual 
native files that are included in a document by reference to the descriptive information 
(e.g., source location, portions of the source to be included, and MIME type) of the
compiled document.

Referring to Fig. 4, an authoring application APP1 12, such as a word
processing application or a graphics application, resides on a client workstation 10. A
user uses the authoring application 12 to produce a document having raw information
content, and to save the content on a local disk 16 in a file 18 formatted in a native file
format of the authoring application 12. If the user desires to access the native APP1
file 18, the user instructs the authoring application 12 to load the native file 18 from
the disk 16 into the memory of the client workstation 10. The user may then
manipulate the file 18 in the memory of the client workstation 10 using the authoring
application 12.

Also residing on the client workstation 10 is a document description format
(DDF) plug-in 14 to the authoring application 12. A DDF file, described in more
detail below, provides an application-independent description of a document saved in
a native file format of an authoring application program. The DDF plug-in 14
provides the authoring application 12 with the ability to save and retrieve DDF files
without requiring any modifications to the authoring application 12. Alternatively, the
DDF plug-in 14 may save information to and retrieve information from transitory data
structures in the memory of the client 10, rather than using DDF files. A separate
application or script (not shown) provides a batch operation to produce DDF files
corresponding to pre-existing native files.

The DDF plug-in 14 interacts with the user by, for example, adding a
command to the authoring application 12 which allows the user to save a DDF file
corresponding to the file currently open in the authoring application 12. In the case of
an authoring application which uses a graphical user interface (GUI), the added
command may take the form of a "Save DDF file" menu item in the authoring
application's "File" menu. A user's selection of this menu item from within the
authoring application 12 causes the DDF plug-in 14 to produce a DDF file 20
describing the native file 18, and to optionally prompt the user for additional meta
information (e.g., the author's name). The user could also be provided with an option
to have the authoring application's default "save file" operation always generate an
accompanying DDF file. Alternatively, instead of using the DDF plug-in 14, the
authoring application 12 may be modified to allow the user to save and retrieve DDF
files.

As previously noted, a DDF file describes a document produced by an
authoring application. The document described by a DDF file is referred to as the
"referenced document" of the DDF file. A DDF file contains three kinds of
information about the DDF file's referenced document: (1) meta information such as
the location and authoring application of the referenced document; (2) method
declarations enumerating the operations that can be performed on the referenced
document; and (3) optional application-specific data describing the referenced
document.

Referring to Fig. 5, the authoring application APP1 12 saves content in native
file 18. The authoring application APP1 12 uses DDF plug-in 14 to save a DDF file
20 describing the native file 18. A user of authoring application APP2 32, residing on
client workstation 30, desires to obtain a copy of native file 18 transformed into
Graphics Interchange Format (GIF). The user instructs DDF client software 34,
residing on client workstation 30, to construct a request DDF file 42, containing a
request to transform native file 18 into GIF format, from the information contained in
APP1 DDF file 20. DDF client software 34 transmits the request DDF file 42 to
APP1 DDF servlet software 26, residing on an APP1 server 24. Alternatively, DDF
client software 34 transmits the APP1 DDF file 20 and a separate request (not shown)
to APP1 DDF servlet software 26.

APP1 DDF servlet software 26 transforms the native APP1 file 18 into GIF
format and stores the resulting GIF file 22 on disk 16. APP1 DDF servlet software 26
constructs a response DDF file 44, which encapsulates the request DDF file 42 and
describes the GIF file 22, and transmits the response DDF file 44 to the DDF client
software 34. The DDF client software 34 transmits information contained within the
response DDF file 44, such as the location of the GIF file 22, to the authoring
application APP2 32. Authoring application APP2 32 uses standard web browser
software 36, residing on client workstation 30, to retrieve the GIF file 22.

In one implementation, a DDF file minimally contains: (1) header fields, such
as Hypertext Transfer Protocol (HTTP) 1.1 header fields, describing properties of the
referenced document, such as its title and the date on which it was produced; (2) a
field specifying the authoring application that produced the referenced document; and
(3) an address or location of the referenced file, such as a Uniform Resource Locator
(URL). An example of a DDF file that references a document created by the Adobe®
Photoshop® authoring program is shown in Table 8.

<DDF>
  <URL VALUE="http://www.company.com/doc.psd"/>
  <Date VALUE="Mon Aug 4 09:48:55 PDT 1997"/>
  <Title VALUE="Picture of House"/>
  <Content-type VALUE="application/vnd.adobe-photoshop"/>
  <Methods>
    <Transformation NAME="PhotoshopToGif"
                    RETURN-TYPE="Image/Gif"
                    PROVIDER="http://ddf.company.com/ptg.class"/>
    <Information NAME="EnumLayers"
                 RETURN-TYPE="Layers"
                 PROVIDER="http://ddf.company.com/el.class"/>
  </Methods>
  <Application-data>
    <Application-name VALUE="Adobe Photoshop"/>
    <Application-version VALUE="4.0"/>
  </Application-data>
</DDF>

Table 8

As shown in Table 8, all DDF elements (e.g., "DDF", "Methods") are encoded
in Extensible Markup Language (XML) syntax, and can therefore be parsed by a 
conforming XML parser even if that parser does not understand the semantics of 
application-specific data contained in the DDF file. DDF element and attribute names 
are case insensitive. 
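
Because a DDF file is ordinary XML, a generic parser can read the fields it needs and skip the rest. The following Python sketch uses the standard library's `xml.etree.ElementTree`; it is an illustration only. The sample file is a cut-down DDF modeled on Table 8, its URLs are placeholders, and the `describe` helper is a hypothetical name. ElementTree matches names case-sensitively, so the sketch spells element names exactly as the sample does.

```python
import xml.etree.ElementTree as ET

# Cut-down DDF modeled on Table 8; the URLs are placeholders.
SAMPLE_DDF = """\
<DDF>
  <URL VALUE="http://www.example.com/house.psd"/>
  <Title VALUE="Picture of House"/>
  <Content-type VALUE="application/vnd.adobe-photoshop"/>
  <Methods>
    <Transformation NAME="PhotoshopToGif" RETURN-TYPE="Image/Gif"
                    PROVIDER="http://ddf.example.com/ptg.class"/>
    <Information NAME="EnumLayers" RETURN-TYPE="Layers"
                 PROVIDER="http://ddf.example.com/el.class"/>
  </Methods>
  <Application-data>
    <Application-name VALUE="Adobe Photoshop"/>
  </Application-data>
</DDF>
"""


def describe(ddf_text):
    """Summarize a DDF without interpreting its Application-data element."""
    root = ET.fromstring(ddf_text)
    if root.tag != "DDF":
        raise ValueError("not a DDF file")
    return {
        "url": root.find("URL").get("VALUE"),
        "content_type": root.find("Content-type").get("VALUE"),
        # Each child of Methods declares one method; its tag gives the kind.
        "methods": [(m.tag, m.get("NAME")) for m in root.findall("Methods/*")],
    }


summary = describe(SAMPLE_DDF)
```

The Application-data element is parsed along with everything else but never inspected, which mirrors the point that a parser need not understand application-specific data.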

In the DDF file shown in Table 8, the element labeled DDF indicates that the 
file is a DDF file. The VALUE attribute of the URL element specifies the location of 
the DDF file's referenced document. The VALUE attribute of the Content-type
element indicates the Multipurpose Internet Mail Extensions (MIME) content type of
the referenced document.

The optional Application-data element contains information about the
referenced document that is specific to the authoring application that produced the
referenced document. For example, Application-data might include information about
the individual layers of a multi-layered object produced by a graphics application, or
information about the location of tab stops in a word processing document. Any
information contained within the Application-data element need only be capable of
being understood by components that directly manipulate the referenced document
(e.g., the application APP1 DDF servlet 26 shown in Figs. 4 and 5). Specifically,
DDF client software 34 need not understand the information contained within the
Application-data element in order to properly process and manipulate DDF files.
Note that DDF files which do not contain an Application-data element still contain
sufficient information to enable the retrieval of all application-specific information
relating to a referenced document by making a request to an appropriate
application-specific server.

The optional Methods element declares methods that can be performed on a
DDF file's referenced document to produce information derived from the referenced
document. A method may be either a transformation or an information method, as
indicated by the TRANSFORMATION and INFORMATION elements, respectively.
Transformation methods, when applied to a source DDF, return a response DDF file
that describes a transformation of the source DDF's referenced document.
Alternatively, a transformation method may return the actual result of applying the
transformation method to the referenced document. For example, the transformation
method declared by the Transformation element in the DDF file shown in Table 8
transforms the DDF file's referenced document (an Adobe® Photoshop® file) into a
Graphics Interchange Format (GIF) file, and returns the resulting GIF file.
Information methods, when applied to a source DDF, return a response DDF file that
contains additional information about the source DDF's referenced document. For
example, application of the Information method declared in the DDF file shown in
Table 8 to a source DDF produces a response DDF file containing a LAYERS element
which contains information about the layers in the source DDF's referenced document
(an Adobe® Photoshop® file).

Both TRANSFORMATION and INFORMATION elements may contain a
NAME attribute (describing a name of the method declared by the element) and a
PROVIDER attribute (providing a pointer to an implementation of the method
declared by the element). Both Transformation and Information elements may have a
RETURN-TYPE attribute. For Transformation elements, the RETURN-TYPE
attribute specifies the MIME type of the document returned when the method declared
by the element is applied. For example, in the DDF file shown in Table 8, the
RETURN-TYPE of the PhotoshopToGif transformation method is "Image/Gif,"
indicating that the result of applying the PhotoshopToGif method is a GIF image. For
Information elements, the RETURN-TYPE attribute specifies the name of the DDF
element whose content is returned as a result of applying the Information method. For
example, the RETURN-TYPE of the Information element in the DDF file shown in
Table 8 is "Layers." This indicates that if a request DDF file, requesting execution of
the "EnumLayers" Information method, is transmitted to an appropriate Photoshop
server, the resulting response DDF file will contain a Layers element containing
information about the layers of the Photoshop document referenced by the DDF file
shown in Table 8. Attributes common to all methods within a Methods element are
optionally listed once as attributes of the enclosing Methods element.
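
The rule that shared attributes may be listed once on the enclosing Methods element suggests a simple resolution step when reading method declarations. The Python sketch below is a hedged illustration: the `resolve_methods` helper and the placeholder PROVIDER URL are invented for the example, and the choice that a method's own attribute overrides a shared one is an assumption, since the text does not state a precedence.

```python
import xml.etree.ElementTree as ET

# Hypothetical Methods element that lists PROVIDER once for both methods.
METHODS_XML = """\
<Methods PROVIDER="http://ddf.example.com/methods.class">
  <Transformation NAME="PhotoshopToGif" RETURN-TYPE="Image/Gif"/>
  <Information NAME="EnumLayers" RETURN-TYPE="Layers"/>
</Methods>
"""


def resolve_methods(methods_elem):
    """Map each method NAME to its full attribute set.

    Attributes on the Methods element are treated as defaults; a method's
    own attributes take precedence (an assumption, not stated in the text).
    """
    shared = dict(methods_elem.attrib)
    resolved = {}
    for method in methods_elem:
        attrs = {**shared, **method.attrib}
        resolved[attrs["NAME"]] = {"kind": method.tag, **attrs}
    return resolved


methods = resolve_methods(ET.fromstring(METHODS_XML))
```

After resolution, each method entry carries NAME, RETURN-TYPE and PROVIDER, so later code can look up where to send a request without consulting the enclosing element again.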

Message-IDs may be used to uniquely identify DDF files participating in
client-server transactions. Message-IDs are generated in a manner analogous to
Message-IDs used in Internet email and Usenet articles. The Request-ID element is a
Message-ID used by the DDF client 34 to uniquely identify a request DDF file being
submitted to a server. The response DDF file generated in response to such a request
DDF file is guaranteed to contain this Request-ID in order to assist the DDF client 34
in associating the response DDF file with the request DDF file. The Response-ID
element is a Message-ID used to uniquely identify a response DDF file. Such a
Response-ID may be used by the DDF client 34 when submitting future requests for
the same resource to the server.
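
The pairing of requests and responses through Message-IDs can be sketched as follows. This Python fragment is an illustration under assumptions: it borrows `email.utils.make_msgid` from the standard library to produce email-style identifiers, it stores each identifier in a VALUE attribute by analogy with the other DDF elements, and the helper names and host names are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import make_msgid


def new_request(domain):
    """Create a request DDF carrying a fresh Request-ID."""
    ddf = ET.Element("DDF")
    ET.SubElement(ddf, "Request-ID", VALUE=make_msgid(domain=domain))
    return ddf


def new_response(request, domain):
    """Create a response DDF that echoes the Request-ID and adds a Response-ID."""
    ddf = ET.Element("DDF")
    request_id = request.find("Request-ID").get("VALUE")
    ET.SubElement(ddf, "Request-ID", VALUE=request_id)
    ET.SubElement(ddf, "Response-ID", VALUE=make_msgid(domain=domain))
    return ddf


def matches(request, response):
    """True if `response` answers `request`, judged by the echoed Request-ID."""
    return (request.find("Request-ID").get("VALUE")
            == response.find("Request-ID").get("VALUE"))


request = new_request("client.example.com")
response = new_response(request, "server.example.com")
```

A client holding several outstanding requests could use `matches` to decide which request a newly arrived response belongs to.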

Other elements which may be included in a DDF file include, but are not
limited to:

Last-modified, indicating the time that the referenced document was last
modified. The format and meaning of this field are analogous to those of HTTP/1.1.

Title, indicating a title to associate with the DDF.

Date, indicating the time of creation of the DDF.

Resource-expires, indicating the time that the referenced document expires on
the hosting server.

If-modified-since, used in DDF cache validation requests. The interpretation
and function of If-modified-since are the same as in HTTP/1.1.

Note that all dates within DDF elements use the syntax defined in the
HTTP/1.1 specification.
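
Dates in the HTTP/1.1 syntax can be produced and compared with standard library helpers, as the following Python sketch shows. It is an illustration only: the function names are hypothetical, and the comparison in `modified_since` is a simplified reading of how an If-modified-since check might be made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(moment):
    """Format an aware datetime in the HTTP/1.1 date syntax (always GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def from_http_date(text):
    """Parse an HTTP/1.1 date string into an aware datetime."""
    return parsedate_to_datetime(text)


def modified_since(last_modified, if_modified_since):
    """True if the document changed after the time the client already holds."""
    return from_http_date(last_modified) > from_http_date(if_modified_since)


stamp = to_http_date(datetime(1997, 7, 28, 20, 1, 12, tzinfo=timezone.utc))
# stamp == "Mon, 28 Jul 1997 20:01:12 GMT"
```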

A DDF does not need to contain all of the information contained within the
DDF's referenced document. Typically a DDF will contain only structural and meta
information derived from the referenced document. A DDF can be thought of as a
promise of service that manifests itself as the bits of the referenced document only
when presented to an appropriate server with a request to produce the referenced
document.

A Transformation Method:

Fig. 6 shows an approach for applying a transformation method to a DDF file.
First, a user obtains a DDF file which will be referred to as the "source" DDF file
(step 46). For purposes of this discussion, assume that the source DDF file in this
case is the APP1 DDF file 20 shown in Fig. 5. Also, for purposes of this discussion,
assume that the APP1 DDF file 20 is the DDF file shown in Table 8, in which the
referenced document is an Adobe Photoshop document. The user obtains the source
DDF file in any of a number of ways, for example, by browsing an online gallery of
Photoshop images using a DDF-enabled web browser, selecting one of the Photoshop
images, and selecting a "Save to DDF" menu item. The user stores APP1 DDF file 20
on local disk 38 for future use.

The user activates DDF client software 34 to generate request DDF file 42,
encapsulating the source DDF file and a request to transform the source DDF file's
referenced document into GIF format (step 48). In general, to apply a method within a
DDF file to the DDF file's referenced document, a request DDF file containing an
Expose-information element or an Apply-transformation element is produced. These
are possibly empty elements that declare calls to an applicable Transformation or
Information method declared by the DDF file. These elements are further qualified by
appropriate attributes taken from the attribute list of the Transformation or
Information method being applied. The DDF itself is an implicit first argument to the
method. Additional arguments, if present, are encoded as the contents of the
Arguments element of the enclosing DDF. For example, a minimal
Apply-transformation element to convert a file into GIF format could appear in a DDF
file as <Apply-transformation NAME="convertToGIF"/>. This Apply-transformation
element would apply the method named "convertToGIF" to the DDF containing the
Apply-transformation element.

In the case of a request to transform the Photoshop document described by the 
DDF file of Table 8, the request takes the form of an Apply-transformation element in 
the request DDF file 42. The request DDF file 42 encapsulates, within the Source-DDF
element, the source DDF to which the Transformation method is to be applied.
The source DDF may be incorporated within the Source-DDF element either by value 
or by reference. Incorporation by value involves copying the entire source DDF into 
the Source-DDF element. Incorporation by reference involves placing a URL, which 
points to the source DDF, within the Source-DDF element. The contents of the 
request DDF file 42 are shown in Table 9. 



<DDF>
  <Date VALUE="Mon, 28 Jul 1997 20:01:12 GMT"/>
  <Creating-Application VALUE="DDF Client"/>
  <Apply-Transformation NAME="PhotoshopToGif"
                        Provider="http://ddf.company.com/ptg.class"/>
  <Source-DDF>
    <!-- DDF of Table 8 is embedded by value here -->
  </Source-DDF>
</DDF>

Table 9
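
A request of the kind shown in Table 9 could be assembled programmatically. The Python sketch below is illustrative only: `build_request` is a hypothetical helper, the URLs are placeholders, the Date element is omitted for brevity, and representing a by-reference source as a URL child of Source-DDF is an assumption, since the text says only that a URL is placed within that element.

```python
import xml.etree.ElementTree as ET


def build_request(method_name, provider, source=None, source_url=None):
    """Build a request DDF that applies a Transformation to a source DDF.

    Pass `source` (an Element) to embed the source DDF by value, or
    `source_url` to embed it by reference.
    """
    if (source is None) == (source_url is None):
        raise ValueError("give exactly one of source or source_url")
    request = ET.Element("DDF")
    ET.SubElement(request, "Creating-Application", VALUE="DDF Client")
    ET.SubElement(request, "Apply-Transformation",
                  NAME=method_name, Provider=provider)
    holder = ET.SubElement(request, "Source-DDF")
    if source is not None:
        holder.append(source)                           # by value
    else:
        ET.SubElement(holder, "URL", VALUE=source_url)  # by reference (assumed form)
    return request


source = ET.fromstring('<DDF><URL VALUE="http://www.example.com/house.psd"/></DDF>')
by_value = build_request("PhotoshopToGif", "http://ddf.example.com/ptg.class",
                         source=source)
by_reference = build_request("PhotoshopToGif", "http://ddf.example.com/ptg.class",
                             source_url="http://www.example.com/house.ddf")
```

Calling `ET.tostring(by_value)` yields the serialized request that a client would transmit to the method provider.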






The client transmits the request DDF file 42 to the location of the method
provider indicated by the PROVIDER attribute of the appropriate
Apply-transformation or Apply-information element of the request DDF file 42 (step 50).
For example, referring to Fig. 4, the client 30 transmits the request DDF file 42 to
application APP1 DDF servlet 26 over network 28. The APP1 DDF servlet 26 is a
provider of a method to transform Photoshop files into GIF files. The method
provider applies the requested transformation to the source DDF file's referenced
document and produces a transformed file, which it stores locally (step 52). For
example, referring to Fig. 4, the APP1 DDF servlet 26 transforms native file 18 from
Photoshop format into a GIF file 22, stored on disk 16.

The method provider produces a response DDF file containing information 
about the transformed file, including a URL pointing to the location at which the 
transformed file is stored (step 54). In the case of Fig. 4, for example, APP1 DDF
servlet 26 produces a response DDF file 44 (Fig. 5) containing a URL pointing to the
GIF file 22. The response DDF file 44 is shown in Table 10. 



<DDF>
  <Date VALUE="Mon, 28 Jul 1997 20:06:12 GMT"/>
  <Last-Modified VALUE="Mon, 28 Jul 1997 20:05:12 GMT"/>
  <URL VALUE="http://ddf.company.com/house.gif"/>
  <TITLE VALUE="Picture of House"/>
  <Content-Type VALUE="Image/Gif"/>
  <Content-Length VALUE="55174"/>
  <Creating-Application VALUE="Photoshop Server"/>
  <Source-DDF>
    <!-- DDF of Table 9 embedded by value here -->
  </Source-DDF>
</DDF>



Table 10 

The method provider transmits the response DDF file 44 to the client 30 (step 
56). The client 30 extracts the URL from the response DDF file 44 to request and 
obtain the transformed file (step 58). For example, referring to Fig. 4, the client 30
extracts the URL from the response DDF file 44 to request and obtain the GIF file 22
using standard web browser 36. Alternatively, in step 56, the method provider 
transmits the transformed file directly to the client 30 in order to eliminate an 
additional client-server transaction. 
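
The client side of steps 56 and 58 can be sketched as follows. This Python fragment is illustrative, not part of the described system: `locate_result` is a hypothetical helper and the sample response is a shortened stand-in for Table 10 with placeholder URLs. Element names are lower-cased before lookup because DDF names are described as case insensitive, and only top-level fields are read so that a URL inside the embedded Source-DDF is not mistaken for the result.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for a response DDF; URLs are placeholders.
RESPONSE_DDF = """\
<DDF>
  <URL VALUE="http://ddf.example.com/house.gif"/>
  <Content-Type VALUE="Image/Gif"/>
  <Content-Length VALUE="55174"/>
  <Source-DDF>
    <DDF><URL VALUE="http://www.example.com/house.psd"/></DDF>
  </Source-DDF>
</DDF>
"""


def locate_result(response_text):
    """Return where the transformed file lives, plus its type and size."""
    root = ET.fromstring(response_text)
    fields = {}
    for child in root:  # direct children only; Source-DDF contents are skipped
        value = child.get("VALUE")
        if value is not None:
            fields[child.tag.lower()] = value
    length = fields.get("content-length")
    return {
        "url": fields["url"],
        "content_type": fields.get("content-type"),
        "length": int(length) if length is not None else None,
    }


result = locate_result(RESPONSE_DDF)
```

The transformed file itself would then be fetched from `result["url"]` with an ordinary HTTP GET, for example through a web browser as described above.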

Subsequent requests to the same method provider by the same or a different
client for a transformation of the same referenced document into GIF format may be
satisfied by the method provider without performing step 52, because the transformed
GIF file may already be accessible to the method provider from a previous
transformation. The method provider may also obtain an existing GIF file from some
other location. The method used by the method provider to obtain the transformed
file in any particular case is transparent to the client.

The Application-data field of the response DDF can be used to cache the results
of applying a method to a source DDF. In the example above involving transforming
a Photoshop file into a GIF file, the resulting GIF file is stored within the response
DDF's Application-data element. Subsequent requests by the user for a
transformation of the same Photoshop file to GIF format are satisfied without
accessing the server, because the GIF file cached within the Application-data
element of the response DDF can be extracted by DDF client software 34 and returned
directly to the user.
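
One way such caching might look is sketched below in Python. The text does not say how the result is encoded inside Application-data, so everything specific here is an assumption made for illustration: the `Cached-result` element name, the ENCODING attribute, the use of base64, and both helper functions are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def cache_result(response, data):
    """Store result bytes inside the response DDF's Application-data element."""
    app_data = response.find("Application-data")
    if app_data is None:
        app_data = ET.SubElement(response, "Application-data")
    cached = ET.SubElement(app_data, "Cached-result", ENCODING="base64")
    cached.text = base64.b64encode(data).decode("ascii")


def cached_result(response):
    """Return the cached bytes, or None if the response DDF holds no cache."""
    cached = response.find("Application-data/Cached-result")
    if cached is None or cached.text is None:
        return None
    return base64.b64decode(cached.text)


response = ET.Element("DDF")
before = cached_result(response)          # nothing cached yet
cache_result(response, b"GIF89a\x00\x01")
after = cached_result(response)           # bytes recovered without a server call
```

A client would consult `cached_result` first and contact the method provider only when it returns None.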

As shown in Table 10, a response DDF encapsulates, within the Source-DDF 
element, the source DDF to which the response DDF is a response. The source DDF 
may be incorporated within the Source-DDF element either by value or by reference. 
Embedding the source DDF within the response DDF (either by reference or by value)
provides an audit trail of DDF transactions.
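
The audit trail formed by nested Source-DDF elements can be walked mechanically. The Python sketch below is a minimal illustration that handles only sources embedded by value; the `audit_trail` helper and the sample document are hypothetical, and a by-reference source would first have to be fetched from its URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical response DDF with two levels of embedded source DDFs.
NESTED_DDF = """\
<DDF>
  <Creating-Application VALUE="Photoshop Server"/>
  <Source-DDF>
    <DDF>
      <Creating-Application VALUE="DDF Client"/>
      <Source-DDF>
        <DDF>
          <Title VALUE="Picture of House"/>
        </DDF>
      </Source-DDF>
    </DDF>
  </Source-DDF>
</DDF>
"""


def audit_trail(ddf):
    """List the creating application of each DDF in the chain, newest first."""
    trail = []
    while ddf is not None:
        creator = ddf.find("Creating-Application")
        trail.append(creator.get("VALUE") if creator is not None else "(not stated)")
        ddf = ddf.find("Source-DDF/DDF")  # step into the embedded source, if any
    return trail


trail = audit_trail(ET.fromstring(NESTED_DDF))
```

For the sample above the trail runs from the server's response, through the client's request, down to the original DDF, which names no creating application.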

An Information Method for Revealing Document Structure:

As explained below, the DDF method enables a client to determine the
hierarchy of information from within the document. From this information a client
may extract relevant content information, and may deduce from the expressed
structure whether the content at a particular level is the relevant information sought.
The DDF method establishes a uniform method for revealing the structure of a
document independently of the document content type. The structure meta-data may
be extracted and included in a DDF representation of the document or in a portion of
that representation. The DDF structural expression identifies a structural
decomposition for a document, components of a document, or components of more
than one document, and can accommodate documents of various types (e.g., text,
image, graphics, and sound). This method provides a way to communicate the
structure of any document content type to any client without regard to the client's
intended use for this information.

As explained above, document content is meaningfully processed by its
inherent nested structure. The meaning of these nesting levels varies between content
types. DDF enables structural expression applications to expose intra-document
structure by means of a DDF primitive STRUCTURE, which is a generic format for
encoding hierarchical structures and their associated meta-data. Element
STRUCTURE does not itself encapsulate the semantic interpretation of a particular
decomposition; rather, it qualifies the decomposition by referring to one of several
structure schema (discussed below). Intermediate and leaf nodes in this format are
encapsulated in element TREE, which encapsulates only the structure of a document,
not document content. Each node in this format reveals its position in the logical
structure through an attribute IS-A. As explained above, hierarchical structure plays
different roles, and a specific document instance may have one or more structure trees
overlaid on it. Element STRUCTURE defines the meaning and interpretation of a
structure hierarchy through a required SCHEMA attribute. Attribute SCHEMA
specifies the semantics of a particular decomposition of a given document instance.
That is, the SCHEMA attribute enables a client to meaningfully interpret the
hierarchical structure encoded in element STRUCTURE. The semantic representation
encoded in SCHEMA is generally content-type specific, but content-instance
independent. The SCHEMA is therefore specific to the decomposition, not
necessarily to the document instance. Attribute SCHEMA also serves to qualify the
namespace for resolving the values of attribute IS-A on structure elements (e.g.,
whether a tree node corresponds to a chapter or a layer). The role of the SCHEMA
attribute is to represent the fact that in one kind of decomposition, for example,
chapters occur below parts and sections occur below chapters. Structural expression
applications expose intra-document structure by invoking a DDF information method
"getStructureInterpretation," which enables the application to express the structural
meaning associated with a particular tree node. That is, a DDF server, knowing the
structures of the hosted documents, either by having a resource that defines the
structures of the hosted documents or by extracting structures from the hosted
documents, may incrementally reveal more and more of the structure depending upon
the client-initiated query.

The structure of FrameMaker<S) document 100 (Fig. lA) may be represented as 
a DDF file, as shown in Table 1 1 . Table 12 contains a structure whose decomposition 
was controlled by the namespace for sheet music (Table 3, above). This structure is 
elaborated down to the level of STAFF; further client requests would reveal 
MEASURE and NOTE nodes. 

<STRUCTURE schema="fin-report" is-a="Model.DOCUMENT">
  <TREE is-a="ARTICLE">
    <TREE is-a="HEADING" child-count="0"/>
    <TREE is-a="SECTION">
      <TREE is-a="HEADING" child-count="0"/>
      <TREE is-a="PARA" child-count="0"/>
      <TREE is-a="PARA" child-count="0"/>
      <TREE is-a="SECTION">
        <TREE is-a="HEADING" child-count="0"/>
        <TREE is-a="PARA" child-count="0"/>
        <TREE is-a="FIGURE" child-count="0"/>
        <TREE is-a="LAYER" child-count="0"/>
        <ROLE is-a="raster"/>
      </TREE>
    </TREE>
  </TREE>
</STRUCTURE>



Table 11 DDF representation of the structure of Adobe® FrameMaker®
document 100 of Fig. 1A.



<STRUCTURE schema="sheet music" is-a="SCORE" child-count="1">
  <TREE node-id="a" is-a="PART" child-count="2">
    <TREE node-id="1" is-a="STAFF" child-count="180"/>
    <TREE node-id="2" is-a="STAFF" child-count="180"/>
  </TREE>
</STRUCTURE>



Table 12 DDF representation of the structure of sheet music.

The structures of an Adobe® Illustrator® document and a PDF document may
be represented as DDF files as shown in Tables 13 and 14, respectively. The Adobe®
Illustrator® document has four layers, vector graphics, two embedded images, and a
text box with editable text. The Adobe® PDF document is only partially exposed, as
indicated by the non-zero CHILD-COUNT values; the pages of this document contain
extractable text and raster images, but this structure is not revealed by the level of
detail presented in the structural expression in Table 14.

<STRUCTURE schema="ai-file" is-a="Model.PORTFOLIO">
  <TREE is-a="LayeredGfxCModel.LAYER">
    <TREE is-a="CModel.UNIT">
      <TREE is-a="VectorGfxCModel.LINE" child-count="0"/>
      <TREE is-a="VectorGfxCModel.LINE" child-count="0"/>
      <TREE is-a="VectorGfxCModel.LINE" child-count="0"/>
      <TREE is-a="VectorGfxCModel.LINE" child-count="0"/>
      <TREE is-a="TxtCModel.TEXT" child-count="0"/>
    </TREE>
  </TREE>

  <TREE is-a="LayeredGfxCModel.LAYER">
    <TREE is-a="VectorGfxCModel.CURVE" child-count="0"/>
    <TREE is-a="VectorGfxCModel.CURVE" child-count="0"/>
    <TREE is-a="VectorGfxCModel.CURVE" child-count="0"/>
    <TREE is-a="VectorGfxCModel.CURVE" child-count="0"/>
  </TREE>

  <TREE is-a="LayeredGfxCModel.LAYER">
    <TREE is-a="RasterGfxCModel.RASTER" child-count="0"/>
  </TREE>

  <TREE is-a="LayeredGfxCModel.LAYER">
    <TREE is-a="RasterGfxCModel.RASTER" child-count="0"/>
  </TREE>
</STRUCTURE>



Table 13 DDF representation of the structure of an Adobe® Illustrator®
document.

<STRUCTURE schema="pdf-doc" is-a="Model.DOCUMENT">
  <TREE is-a="DocGfxCModel.PAGE" child-count="3"/>
  <TREE is-a="DocGfxCModel.PAGE" child-count="3"/>
  <TREE is-a="DocGfxCModel.PAGE" child-count="5"/>
  <TREE is-a="DocGfxCModel.PAGE" child-count="1"/>
  <TREE is-a="DocGfxCModel.PAGE" child-count="3"/>
</STRUCTURE>

Table 14 DDF representation of the structure of an Adobe® PDF 
document. 

Thus, the hierarchical structure of a document may be revealed on a request-to-know 
basis. For example, the DDF instance for an encyclopedia may contain an element 
STRUCTURE that encodes the fact that the encyclopedia has twenty-six children. 
The associated SCHEMA captures the fact that the immediate children of an
"Encyclopedia" are "Volumes." A client receiving such a DDF instance and wishing
to learn more information about a particular child (i.e., volume) would submit a
request DDF to a DDF-enabled application server for a DDF file that revealed more
structure about that particular child. The DDF application server would examine the
application files making up the encyclopedia, discover that the requested child has,
e.g., sixteen chapters, and record that information within element STRUCTURE in a
reply DDF that would be returned to the client. 

In a client-server implementation, the DDF method enables a client to receive
basic structure information from a server-hosted document. The structure information
may be mapped to or reflect actual content of the queried document, independently of
the document content type. Given a first-level revealing of the structure (and
corresponding content), a decision may be made by the client whether a second-level
revealing is sought. The server may then determine the various levels of a document's
structure (and hence content) that can be revealed to the client. Thus, given a DDF
server that contains structure information of a resident document, a query may be
made of the DDF server to reveal information from the sought document. Given that
a document's structure correlates to the content within it, by making a query on a
structure-level basis, corresponding content may be incrementally revealed.

Structural expression applications reveal as much (or as little) of the tree
structure as is appropriate for a particular document instance. Structural expression
applications that expose the hierarchical structure of a document also support a DDF
Information Method "revealStructure," which takes as arguments a position in the tree
and a depth specification. The result of applying this transformation is a new DDF
that exposes the tree structure at the level of detail requested by a client. Required
attribute CHILD-COUNT on elements at the leaves of the exposed structure may be
used by clients when deciding whether to ask for additional structural information
about a document instance. In one implementation, a non-zero value of CHILD-COUNT
indicates that more structure may be revealed than is currently expressed.
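
As a minimal sketch of the "revealStructure" method described above, assuming a document held as a nested in-memory tree and a position given as a path of child indices (both assumptions of this sketch, as are the names BOOK, find, expose, and reveal_structure), the incremental revealing might look like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory document model: (is-a label, list of children).
BOOK = ("book", [
    ("chapter", [("section", []), ("section", [])]),
    ("chapter", [("section", []), ("section", []), ("section", [])]),
])


def find(node, position):
    """Follow a path of child indices (the 'position') down from the root."""
    for index in position:
        node = node[1][index]
    return node


def expose(node, depth, tag):
    """Build an element for `node`, expanding at most `depth` levels below it."""
    label, children = node
    elem = ET.Element(tag, {"is-a": label})
    if depth == 0 or not children:
        # Leaf of the exposed structure: report how many children remain hidden.
        elem.set("child-count", str(len(children)))
    else:
        for child in children:
            elem.append(expose(child, depth - 1, "TREE"))
    return elem


def reveal_structure(root, position, depth, schema):
    """Return a STRUCTURE element exposing `depth` levels below `position`."""
    structure = expose(find(root, position), depth, "STRUCTURE")
    structure.set("schema", schema)
    return structure


first = reveal_structure(BOOK, [], 1, "text-logical")
print(ET.tostring(first, encoding="unicode"))
```

A client seeing a non-zero child-count on the second chapter could then call reveal_structure(BOOK, [1], 1, "text-logical") to expose that chapter's sections. For brevity the sketch places CHILD-COUNT only on the leaves of the exposed structure.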

Element STRUCTURE encodes a specific structural decomposition of a
document instance. Where multiple structural decompositions exist, each structural
decomposition may be expressed by separate STRUCTURE elements. Documents
often decompose into multiple synchronous tree structures (e.g., a picture may be
decomposed into three layers, where each layer may also play the role of a raster). In
one implementation, an empty element ROLE may be used to declare the various
other roles played by intermediate and leaf nodes in a structure tree.

Document management applications using DDF often need to encode
properties - and more generally property hierarchies - that are specific to a given
application domain. DDF enables structural expression applications to encapsulate
such properties through a PROPERTIES element. This element may hold a property
hierarchy and specify a schema for this hierarchy for use in a validating processor.
There may be one PROPERTY element per unique properties schema. The
PROPERTIES element may appear within the DDF TREE element, enabling
structural expression applications to store properties about specific structural elements
in the DDF, as shown in Tables 15 and 16, for example. Table 15 contains a DDF
representation of the structure of a book containing two chapters with the structure
exposed up to the chapter level. Non-zero values of attribute CHILD-COUNT
indicate that there is more structure that can be revealed. Table 16 contains a DDF
representation of the structure of an image containing two layers. Optional element
ROLE indicates other semantic roles played by a specific structural unit.

<STRUCTURE schema="text-logical" is-a="book" child-count="2">
  <PROPERTIES>
    <PROPERTY name="title">Introduction</PROPERTY>
  </PROPERTIES>
  <TREE is-a="chapter" child-count="2">
    <PROPERTIES>
      <PROPERTY name="title">Welcome To Logical Structure</PROPERTY>
    </PROPERTIES>
  </TREE>
  <TREE is-a="chapter" child-count="3">
    <PROPERTIES>
      <PROPERTY name="title">Conclusion To Logical Structure</PROPERTY>
    </PROPERTIES>
  </TREE>
</STRUCTURE>



Table 15 DDF representation of the structure of a book containing two
chapters with the structure exposed up to the chapter level.



<STRUCTURE schema="image-structure" is-a="graphic" child-count="2">
  <PROPERTIES>
    <PROPERTY name="title">Picture Of Aster In Front Of Adobe Towers</PROPERTY>
  </PROPERTIES>
  <TREE is-a="layer">
    <ROLE is-a="raster"/>
    <PROPERTIES>
      <PROPERTY name="title">Picture Of Aster Labrador</PROPERTY>
      <PROPERTY name="label">foreground</PROPERTY>
    </PROPERTIES>
  </TREE>
  <TREE is-a="layer">
    <ROLE is-a="raster"/>
    <PROPERTIES>
      <PROPERTY name="title">Picture of Adobe Towers</PROPERTY>
      <PROPERTY name="label">background</PROPERTY>
    </PROPERTIES>
  </TREE>
</STRUCTURE>



Table 16 DDF representation of the structure of an image containing two
layers.
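
For illustration, a client might read the ROLE and PROPERTIES elements of the two-layer image structure in the table above roughly as follows. This is a sketch using Python's standard xml.etree.ElementTree module; the function name describe and the dictionary layout it returns are assumptions, not part of DDF.

```python
import xml.etree.ElementTree as ET

# Cleaned-up transcription of the two-layer image structure from the table above.
DDF_STRUCTURE = """
<STRUCTURE schema="image-structure" is-a="graphic" child-count="2">
  <PROPERTIES>
    <PROPERTY name="title">Picture Of Aster In Front Of Adobe Towers</PROPERTY>
  </PROPERTIES>
  <TREE is-a="layer">
    <ROLE is-a="raster"/>
    <PROPERTIES>
      <PROPERTY name="title">Picture Of Aster Labrador</PROPERTY>
      <PROPERTY name="label">foreground</PROPERTY>
    </PROPERTIES>
  </TREE>
  <TREE is-a="layer">
    <ROLE is-a="raster"/>
    <PROPERTIES>
      <PROPERTY name="title">Picture of Adobe Towers</PROPERTY>
      <PROPERTY name="label">background</PROPERTY>
    </PROPERTIES>
  </TREE>
</STRUCTURE>
"""


def describe(node):
    """Summarize one STRUCTURE or TREE node: its type, extra roles, and properties."""
    return {
        "is_a": node.get("is-a"),
        # ROLE children declare additional roles played by this node.
        "roles": [role.get("is-a") for role in node.findall("ROLE")],
        # Only PROPERTIES directly under this node belong to it.
        "properties": {
            prop.get("name"): (prop.text or "").strip()
            for prop in node.findall("PROPERTIES/PROPERTY")
        },
    }


root = ET.fromstring(DDF_STRUCTURE)
summary = [describe(root)] + [describe(tree) for tree in root.findall("TREE")]
for entry in summary:
    print(entry)
```

Each layer is reported as a "layer" that also plays the role "raster", with its own title and label kept separate from the title of the image as a whole.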

Other Methods

Other operations may also be performed on DDFs. Consider a user who
queries a database of graphical images. The user picks three images out of a set of
the ten images that result from the query. Information about these three images is
assembled to construct a single composite DDF file that is either saved on the user's
local disk, printed, mailed to another user, or posted on the World Wide Web. Use of
a composite DDF file in the place of the original three documents enables the user to
move the composite DDF file around as a single file which is much smaller than the
combination of the original three documents. The selected image data will only be
retrieved from the image database when necessary, e.g., by a printer when the user
requests to print the data, or by an email recipient reading the mail message.
Furthermore, the user query could return a set of DDFs instead of the actual images,
with the image data being retrieved from the image database at a subsequent time if
the user so chooses.

The single composite DDF file described above may be implemented by using
an AGGREGATION element. An example of such a DDF is shown in Table 17. An
AGGREGATION element works like a virtual paper clip for putting together a sheaf
of DDFs. In other words, the AGGREGATION element is the DDF conjunction
operator. The component DDFs making up the aggregation may be embedded either
by value or by reference. The meaning of the AGGREGATION element is that when
the DDF client 25 processes the aggregation, all of the aggregation's components will
be processed.

<DDF>
  <Date VALUE="Mon Aug 4 09:48:55 PDT 1997"/>
  <Title VALUE="Photo Album"/>
  <Content-Type VALUE="Application/ddf-aggregation"/>
  <Aggregation>
    <DDF REF="photo-1.ddf"/>
    <DDF REF="photo-2.ddf"/>
    <DDF REF="photo-3.ddf"/>
  </Aggregation>
</DDF>

Table 17 
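
One way a DDF client might assemble and then walk such an aggregation is sketched below. The helper names build_aggregation and components are hypothetical, and the sketch covers only components embedded by reference.

```python
import xml.etree.ElementTree as ET


def build_aggregation(title, refs):
    """Return a composite DDF element that references each component DDF."""
    ddf = ET.Element("DDF")
    ET.SubElement(ddf, "Title", {"VALUE": title})
    ET.SubElement(ddf, "Content-Type", {"VALUE": "Application/ddf-aggregation"})
    aggregation = ET.SubElement(ddf, "Aggregation")
    for ref in refs:
        ET.SubElement(aggregation, "DDF", {"REF": ref})
    return ddf


def components(ddf):
    """List every component of the aggregation; a client processes all of them."""
    return [
        # Embedded-by-reference components carry REF; by-value ones are inline.
        child.get("REF") if child.get("REF") is not None else child
        for child in ddf.findall("Aggregation/DDF")
    ]


album = build_aggregation("Photo Album", ["photo-1.ddf", "photo-2.ddf", "photo-3.ddf"])
print(ET.tostring(album, encoding="unicode"))
print(components(album))
```

Because the aggregation acts as a conjunction, the client iterates over every entry returned by components rather than selecting among them.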

Similarly, the ALTERNATION element can be used within a composite DDF
to provide the functionality of a logical disjunction. The ALTERNATION element is
used to allow the DDF client 25 to pick one out of a collection of DDFs. The
ALTERNATION element itself does not specify which element to pick; the DDF
client 25 might choose one of the alternatives in an alternation based on constraints
provided by the environment. For example, consider an image that is to be delivered
to the DDF client 25. The image may have several representations, e.g., a
low-resolution representation for quick screen previews, a medium-resolution
representation for printing on an inkjet printer, and a high-resolution representation
for sending to a high-end imagesetter. A DDF ALTERNATION element may be used
to encapsulate each of the representations within a single DDF element. When the
DDF file containing the alternation is delivered to the DDF client 25, the DDF client
25 chooses which one of the DDFs encapsulated within the alternation to retrieve,
based on the current user environment.

The component DDFs of an alternation may be embedded either by value or by 
reference. The meaning of the ALTERNATION element is that when the DDF client 
25 processes the alternation, one and only one of the alternation's components will be 
consumed. A DDF file containing an ALTERNATION element is shown in Table 18.
Ellipses indicate portions of the DDF omitted for clarity. 



<DDF>
  <Alternation>
    <!-- Alternative 1: embed by value -->
    <DDF>
      <!-- text of DDF file goes here -->
    </DDF>
    <!-- Alternative 2: embed by reference -->
    <DDF ref="http://ddf.company.com/document.ddf"/>
  </Alternation>
</DDF>



Table 18 
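
The choice among alternatives is left to the client and its environment. As one possible sketch, a client could select by output resolution. The Resolution child elements, the file names, and the function name choose_alternative are all hypothetical, since the description does not define how alternatives are annotated.

```python
import xml.etree.ElementTree as ET

# The Resolution child elements (in dpi) are a hypothetical annotation added
# for this sketch; the source does not define how alternatives are described.
ALTERNATION_DDF = """
<DDF>
  <Alternation>
    <DDF REF="preview.ddf"><Resolution VALUE="72"/></DDF>
    <DDF REF="inkjet.ddf"><Resolution VALUE="300"/></DDF>
    <DDF REF="imagesetter.ddf"><Resolution VALUE="2400"/></DDF>
  </Alternation>
</DDF>
"""


def choose_alternative(ddf, device_dpi):
    """Pick exactly one alternative: the lowest resolution that satisfies the
    device, falling back to the highest available one."""
    options = sorted(
        ddf.findall("Alternation/DDF"),
        key=lambda alt: int(alt.find("Resolution").get("VALUE")),
    )
    for alt in options:
        if int(alt.find("Resolution").get("VALUE")) >= device_dpi:
            return alt
    return options[-1]


root = ET.fromstring(ALTERNATION_DDF)
print(choose_alternative(root, 300).get("REF"))
```

Only the selected component is returned, reflecting the rule that one and only one of the alternation's components is consumed.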

Another useful operation that can be performed using DDFs is document
subsetting. Consider the DDF shown in Table 19. The Application-Data element of
the DDF contains Start-Page, End-Page, and Number-Of-Pages elements, containing
information about the starting page, ending page, and number of pages of the
referenced document, respectively. A user who wishes to print pages 10 through 20 of
the referenced document could use the DDF client 25 to produce a secondary DDF file
which encapsulates the original DDF file within the Source-DDF element, having a
Start-Page of 10 and an End-Page of 20. An example of such a secondary DDF is
shown in Table 20. This secondary DDF could be passed to an appropriate
application-specific DDF-aware server to obtain a Portable Document Format (PDF)
document that has the document data for pages 10 through 20. Similarly, such a
secondary DDF could be passed to a DDF-aware printer 42 that downloads only the
minimal amount of document data needed to render the desired pages. In other words,
DDF-aware applications can use such a secondary DDF file as a substitute for the
referenced document until the document data is actually needed.

<DDF>
  <Date VALUE="Fri, 27 Jun 1997 20:06:12 GMT"/>
  <URL VALUE="http://www.company.com/thesis.pdf"/>
  <NAME VALUE="PhD Thesis"/>
  <Content-Type VALUE="Application/PDF"/>
  <Content-Length VALUE="1105232"/>
  <Last-Modified VALUE="Fri, 05 Aug 1994 01:17:21 GMT"/>
  <Creating-Application VALUE="Adobe Framemaker"/>
  <Application-Data>
    <Application-Version VALUE="5.5"/>
    <Source-Data>
      <!-- Application-specific data goes here -->
    </Source-Data>
    <Number-Of-Pages VALUE="143"/>
    <Start-Page VALUE="1"/>
    <End-Page VALUE="143"/>
    <Number-Of-Chapters VALUE="6"/>
  </Application-Data>
</DDF>



Table 19 

<DDF>
  <Date VALUE="Fri, 27 Jun 1997 20:06:12 GMT"/>
  <URL VALUE="http://www.company.com/thesis.pdf"/>
  <TITLE VALUE="PhD Thesis"/>
  <Content-Type VALUE="Application/PDF"/>
  <Content-Length VALUE="1105232"/>
  <Last-Modified VALUE="Fri, 05 Aug 1994 01:17:21 GMT"/>
  <Creating-Application VALUE="Adobe Framemaker"/>
  <Application-Data>
    <Application-Version VALUE="5.5"/>
    <Source-Data>
      <!-- Application-specific data goes here -->
    </Source-Data>
    <Number-Of-Pages VALUE="143"/>
    <Start-Page VALUE="10"/>
    <End-Page VALUE="20"/>
    <Number-Of-Chapters VALUE="6"/>
  </Application-Data>
  <Source-DDF>
    <!-- DDF of Table 19 embedded by value here -->
  </Source-DDF>
</DDF>

Table 20
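
The subsetting operation described above might be sketched as follows, assuming the primary DDF is available as parsed XML. The function name make_subset, the abbreviated PRIMARY listing, and the range check are assumptions of this sketch rather than part of the described method.

```python
import copy
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the primary DDF; only the fields used below are kept.
PRIMARY = """
<DDF>
  <URL VALUE="http://www.company.com/thesis.pdf"/>
  <Application-Data>
    <Number-Of-Pages VALUE="143"/>
    <Start-Page VALUE="1"/>
    <End-Page VALUE="143"/>
  </Application-Data>
</DDF>
"""


def make_subset(primary, start_page, end_page):
    """Return a secondary DDF covering start_page..end_page of the document."""
    data = primary.find("Application-Data")
    first = int(data.find("Start-Page").get("VALUE"))
    last = int(data.find("End-Page").get("VALUE"))
    if not first <= start_page <= end_page <= last:
        raise ValueError("requested page range lies outside the document")

    secondary = copy.deepcopy(primary)
    subset = secondary.find("Application-Data")
    subset.find("Start-Page").set("VALUE", str(start_page))
    subset.find("End-Page").set("VALUE", str(end_page))
    # Embed the original DDF by value so the subset can be traced to its source.
    ET.SubElement(secondary, "Source-DDF").append(copy.deepcopy(primary))
    return secondary


primary = ET.fromstring(PRIMARY)
secondary = make_subset(primary, 10, 20)
print(ET.tostring(secondary, encoding="unicode"))
```

The secondary DDF carries only metadata, so it can stand in for the document until a server or printer actually needs the page data for pages 10 through 20.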

Referring to Fig. 7, the document description format plug-in 24 is
implemented in digital electronic circuitry or in computer hardware, firmware,
software, or in combinations of them. Apparatus of the invention may be 
implemented in a computer program product tangibly embodied in a machine-readable 
storage device for execution by a computer processor; and method steps of the 
invention may be performed by a computer processor executing a program to perform 
functions of the invention by operating on input data and generating output. 

Suitable processors 1080 include, by way of example, both general and special
purpose microprocessors. Generally, a processor will receive instructions and data
from a read-only memory (ROM) 1120 and/or a random access memory (RAM) 1110
through a CPU bus 300. A computer can generally also receive programs and data
from a storage medium such as an internal disk 1030 operating through a mass storage
interface 1040 or a removable disk 1010 operating through an I/O interface 1020. The
flow of data over an I/O bus 1050 to and from I/O devices 1010, 1030, 1060, 1070
and the processor 1080 and memory 1110, 1120 is controlled by an I/O controller.
User input is obtained through a keyboard 1070, mouse, stylus, microphone, trackball,
touch-sensitive screen, or other input device. These elements will be found in a
conventional desktop or workstation computer as well as other computers suitable for
executing computer programs implementing the methods described here, which may
be used in conjunction with any digital print engine 1075 or marking engine, display
monitor 1060, or other raster output device capable of producing color or gray scale
pixels on paper, film, display screen, or other output medium.

Storage devices suitable for tangibly embodying computer program
instructions include all forms of non-volatile memory, including by way of example
semiconductor memory devices, such as EPROM, EEPROM, and flash memory
devices; magnetic disks such as internal hard disks 1030 and removable disks 1010;
magneto-optical disks; and CD-ROM disks. Any of the foregoing may be
supplemented by, or incorporated in, specially-designed ASICs (application-specific
integrated circuits).

Although elements of the invention are described in terms of a software
implementation, the invention may be implemented in software or hardware or
firmware, or a combination of the three.

The present invention has been described in terms of an embodiment. The
invention, however, is not limited to the embodiment depicted and described. Rather,
the scope of the invention is defined by the claims.

WHAT IS CLAIMED IS:

1. A method of describing the hierarchical structure of a document having
content of a characteristic type of content, comprising:

expressing, independently of the document content type, the hierarchical structure
of the document as a tree structure of one or more nodes; and

providing a semantic representation for interpreting the tree structure.

2. The method of claim 1 further comprising associating with each tree node an
attribute describing the semantic character of the associated tree node.

3. The method of claim 1 wherein the semantic representation is provided based
upon the document content type.

4. The method of claim 1 wherein the semantic representation is provided
independently of the document content.

5. The method of claim 1 wherein the semantic representation defines parent-child
relationships among the nodes.

6. The method of claim 5 wherein expressing comprises associating with a node
a child-count attribute indicative of whether the node has associated child nodes that
have not yet been expressed in the tree structure.

7. The method of claim 1 further comprising:

expressing, independently of the document content type, the hierarchical structure
of the document as a second tree structure of one or more nodes; and

interpreting the second tree structure in accordance with a second semantic
representation which is different from the first semantic representation.

8. The method of claim 1 wherein expressing comprises associating with a given
node an attribute identifying a second semantic interpretation for the structural feature
of the document represented by the given node, the second semantic interpretation
being different from the first semantic interpretation.

9. The method of claim 1 further comprising recording the hierarchical structure
of the document on a computer-readable medium.

10. A method of extracting content from a document having content of a
characteristic type of content, comprising:

providing access to document content in response to a request for document
content based upon an expression of the hierarchical structure of the document that is
independent of document content type and has an interpretation controlled by a
semantic representation.

11. The method of claim 10 further comprising providing the requested document
content.

12. The method of claim 10 further comprising providing a pointer to the
requested document content.

13. The method of claim 10 wherein the access to document content is provided in
response to a client request.

14. A method of describing the hierarchical structure of a document having
content of a characteristic type of content, comprising:

in response to a client request for information relating to the hierarchical structure
of the document, expressing, independently of the document content type, the
hierarchical structure of the document as a tree structure of one or more nodes; and

providing a semantic representation for interpreting the tree structure.

15. The method of claim 14 wherein the client request comprises a request for
information relating to the position of one or more nodes within the tree structure.

16. The method of claim 14 wherein expressing comprises expressing the
hierarchical structure of the document at a level of detail specified in the client
request.

17. The method of claim 14 wherein expressing comprises associating with a
given node an attribute indicating the relative detail level represented by the given
node.

18. The method of claim 17 further comprising:

in response to a client request for structural information about the given node at a
level of detail that is different from the level of detail indicated by the attribute
associated with the given node, expressing, independently of document content type,
the hierarchical structure of the document as a tree structure of one or more nodes,
including the given node, at the detail level specified in the client request.

19. The method of claim 14 wherein expressing comprises associating with a
given node an attribute identifying a second semantic representation for the structural
feature of the document represented by the given node, the second semantic
interpretation being different from the first semantic interpretation.

20. The method of claim 19 further comprising, in response to a client request,
providing the second semantic representation for interpreting the given node.

21. The method of claim 20 further comprising, in response to a client request,
providing access to document content based on the second semantic representation.

22. A document description file, stored on a computer-readable medium, for
describing the hierarchical structure of a document having content of a characteristic
type of content, comprising:

a tree structure of one or more nodes expressing, independently of the document
content type, the hierarchical structure of the document; and

a semantic representation for interpreting the tree structure.

23. The document description file of claim 22 further comprising an attribute
associated with each tree node describing the semantic character of the associated tree
node.

24. The document description file of claim 22 wherein the semantic representation
is based upon the document content type.

25. The document description file of claim 22 wherein the semantic representation
is independent of the document content.

26. The document description file of claim 22 wherein the semantic representation
defines parent-child relationships among the nodes.

27. The document description file of claim 26 further comprising a child-count
attribute associated with a node that is indicative of whether the node has associated
child nodes that are not yet expressed in the tree structure.

28. The document description file of claim 22 further comprising:

a second document description file comprising

a second tree structure of one or more nodes expressing, independently of the
document content type, the hierarchical structure of the document, and

a second semantic representation which is different from the first semantic
representation for interpreting the second tree structure.

29. The document description file of claim 22 further comprising an attribute
associated with a given node identifying a second semantic interpretation for the
structural feature of the document represented by the given node, the second semantic
interpretation being different from the first semantic interpretation.

30. A document description file, stored on a computer-readable medium, for 
describing the hierarchical structure of a document having content of a characteristic 
type of content, comprising: 

a tree structure of one or more nodes expressing, independently of the document
content type, the hierarchical structure of the document;

a semantic representation for interpreting the tree structure; and

information relating to document content within the hierarchical structure
expressed by one or more tree nodes produced in response to a client request for
document content associated with one or more tree nodes.

31. The document description file of claim 30 wherein the information relating to
document content comprises a pointer to the requested document content.

32. The document description file of claim 30 wherein the information relating to
document content comprises the requested document content.

33. A document description file, stored on a computer-readable medium, for
describing the hierarchical structure of a document having content of a characteristic
type of content, comprising:

a tree structure of one or more nodes expressing, independently of the document
content type, the hierarchical structure of the document, the tree structure being
produced in response to a client request for information relating to the hierarchical
structure of the document; and

a semantic representation for interpreting the tree structure.

34. The document description file of claim 33 wherein the tree structure expresses
the hierarchical structure of the document at a level of detail specified in the client
request.

35. The document description file of claim 33 further comprising an attribute 
associated with a given node indicating the relative detail level represented by the 
given node. 

36. The document description file of claim 35 further comprising:

a tree structure of one or more nodes, including the given node, expressing,
independently of document content type, the hierarchical structure of the document at
a detail level specified in a client request for structural information about the given
node, the requested detail level being different from the level of detail indicated by the
attribute associated with the given node.

37. The document description file of claim 33 further comprising an attribute
associated with a given node identifying a second semantic interpretation for the
structural feature of the document represented by the given node, the second semantic
interpretation being different from the first semantic interpretation.



38. A method executed on a computer, comprising:

generating a first document description file for describing a document stored on a
computer-readable medium, comprising:

generating a description of an application that produced the document;

generating a description of a location from which the document can be
obtained; and

generating a description of an operation that can be performed on the
document.

39. The method of claim 38, wherein the description of the location comprises a
uniform resource locator.

40. The method of claim 39, wherein the uniform resource locator identifies a
server configured to produce the document upon request.

41. The method of claim 39, wherein the uniform resource locator identifies a
location at which the document is stored.

42. The method of claim 38, wherein:

the operation comprises a transformation of the document from a file stored in a
first storage format to a file stored in a second storage format; and

the operation produces a second document description file that describes the file
stored in the second storage format.

43. The method of claim 42, further wherein the second document description file
describes the first document description file.

44. The method of claim 38, wherein:

the operation comprises extraction of information from the document; and

the operation produces a second document description file that describes the
information extracted from the document.

45. The method of claim 44, further wherein the second document description file
describes the first document description file.

46. The method of claim 44, wherein the information extracted from the document
describes a range of pages of the document.

47. The method of claim 44, wherein the document represents a multi-layered
graphical object, and the information extracted from the document describes a subset
of the layers of the multi-layered graphical object.

48. The method of claim 38, further comprising generating application-specific
data describing the document.

49. The method of claim 48, wherein application-specific data comprises a name
of an application that produced the document.

50. The method of claim 48, wherein application-specific data comprises a version
number of an application that produced the document.

51. The method of claim 38, further comprising:

generating a field containing information describing the document.

52. The method of claim 51, wherein the field is an HTTP header.

53. The method of claim 51, wherein the field describes a date on which the
document was produced.

54. The method of claim 51, wherein the field describes a date on which the
document was modified.

55. The method of claim 51, wherein the field describes a size of the document.

56. The method of claim 51, wherein the field describes content contained in the
document.

57. The method of claim 38, wherein the content of the first document description
file is represented in XML syntax.

58. A method for processing a request for information derived from a first
document, the first document being stored on a computer-readable medium, the first
document being described by a first document description file stored on a
computer-readable medium:

receiving the request and the first document description file from a client;

retrieving the information derived from the first document;

generating a second document description file describing the information derived
from the first document.

59. The method of claim 58, wherein retrieving the information derived from the
first document comprises retrieving a second document, stored on a computer
readable medium, containing the information derived from the first document.
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60 H^e method of claixn 58. wherein retrieving the information derived from the 
first document comprises perfoniung the operation on the first docu^^^^ 

the information derived from the first document. 

61. The method of claim 58, wherein the information derived from the first
document comprises a second document.

62. The method of claim 61, wherein the second document description file
comprises a pointer to the second document.

63. The method of claim 62, wherein the pointer comprises a uniform resource
locator.

64. The method of claim 61, further comprising transmitting the second document
to the client.

65. The method of claim 58, wherein the information derived from the first
document comprises the first document.

66. The method of claim 58, wherein the information derived from the first
document comprises a pointer to the first document.

67. The method of claim 66, wherein the pointer comprises a uniform resource 
locator. 

68. The method of claim 58, further comprising:
transmitting the second document description file to the client.

69. The method of claim 58, wherein the first document description file contains
the request.

70. A document description file, stored on a computer-readable medium, for
describing a document stored on a computer-readable medium, the document
description file comprising:

a description of an application program that produced the document; 
a description of a location from which the document can be obtained; and 
a description of an operation that can be performed on the document. 

71. The document description file of claim 70, wherein:

the operation that can be performed on the document comprises a transformation of
the document from a file stored in a first storage format to a file stored in a second
storage format; and

the operation produces a second document description file that describes the file 
stored in the second storage format. 

72. The document description file of claim 70, wherein:
the operation that can be performed on the document comprises extraction of
information from the document; and

the operation produces a second document description file that describes the
information extracted from the document.

73. The document description file of claim 70, further comprising a description of
an operation to be performed on the document.
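
For illustration, the three descriptions recited in claim 70, together with the optional requested operation of claim 73, could be modeled as a small record with an XML form. This is a sketch only: the class name, field names, and XML element names are assumptions and do not come from the application.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentDescription:
    application: str                 # program that produced the document
    location: str                    # where the document can be obtained
    operations: List[str] = field(default_factory=list)  # operations that can be performed
    requested: Optional[str] = None  # operation to be performed, if any (claim 73)

    def to_xml(self) -> str:
        """Serialize to XML; element names are hypothetical."""
        root = ET.Element("description")
        ET.SubElement(root, "application").text = self.application
        ET.SubElement(root, "location").text = self.location
        for name in self.operations:
            ET.SubElement(root, "operation", name=name)
        if self.requested is not None:
            ET.SubElement(root, "request").text = self.requested
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "DocumentDescription":
        """Parse the XML form produced by to_xml."""
        root = ET.fromstring(text)
        return cls(
            application=root.findtext("application"),
            location=root.findtext("location"),
            operations=[op.get("name") for op in root.findall("operation")],
            requested=root.findtext("request"),
        )
```

Under this model, an operation such as a format transformation (claim 71) or an extraction (claim 72) would return a new `DocumentDescription` for its output.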

74. A method executed on a computer for retrieving information derived from a
first document, the first document being stored on a computer-readable medium, the
method comprising:

retrieving by a first client a first document description file, stored on a computer-
readable medium, in response to a request from a second client, the first document
description file describing an application that produced the first document, a location
from which the first document can be obtained, and an operation that can be
performed on the first document;

using the first document description file to retrieve the information derived from
the first document; and

[illegible] the information derived from the first document to the second client.

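75. The method of claim 74, wherein using the first document description file to
retrieve the information derived from the first document comprises retrieving a second
document, stored on a computer-readable medium, containing the information derived
from the first document.

76. The method of claim 74, wherein using the first document description file to
retrieve the information derived from the first document comprises performing the
operation on the first document to produce the information derived from the first
document.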

77. The method of claim 74, wherein the information derived from the first
document comprises a second document.

78. The method of claim 74, wherein the information derived from the first 
document comprises the first document. 

79. A system comprising:

a first computer-readable medium having a first document produced by a first
application;

a second computer-readable medium having a first document description file
describing the first application, a location on the first computer-readable medium from
which the first document can be obtained, and a description of an operation that can
be performed on the first document; and

a server configured to produce information derived from the first document.

80. The system of claim 79, wherein the information derived from the first
document comprises a second document.

81. The system of claim 80, wherein the information derived from the first
document comprises a second document description file describing the second
document.

82. The system of claim 81, wherein the second document description file further
describes the first document description file.

83. The system of claim 79, wherein the first and second computer-readable media
are the same computer-readable media.

84. The system of claim 79, wherein the first and second computer-readable media
are different computer-readable media.

85. A method executed on a computer, comprising:

generating a composite document description file for describing a combination of a
first document description file and a second document description file, the method
comprising:

generating a description of the combination;
generating a description of the first document description file; and
generating a description of the second document description file.

86. The method of claim 85, wherein the combination is an aggregation of the first
document description file and the second document description file.

87. The method of claim 85, wherein the combination is an alternation of the first
document description file and the second document description file.
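
A composite description of the kind recited in claims 85 to 87 might be generated along the following lines. This is a hedged sketch: the `composite` and `member` element names, the `kind` and `href` attributes, and the choice to describe each member description file by its location are assumptions. The claims do not define aggregation or alternation further, so the sketch only records which of the two the combination is.

```python
import xml.etree.ElementTree as ET

# The two kinds of combination named in claims 86 and 87.
COMBINATION_KINDS = ("aggregation", "alternation")


def make_composite(kind, first_location, second_location):
    """Build a composite description referring to two description files.

    kind: "aggregation" or "alternation".
    first_location, second_location: where each member description file
    can be found (a hypothetical way of describing the members).
    """
    if kind not in COMBINATION_KINDS:
        raise ValueError(f"unknown combination kind: {kind!r}")
    # Description of the combination itself.
    root = ET.Element("composite", kind=kind)
    # Description of the first and second document description files.
    for location in (first_location, second_location):
        ET.SubElement(root, "member", href=location)
    return ET.tostring(root, encoding="unicode")
```

Referring to members by location rather than embedding them keeps the composite small, at the cost of an extra retrieval per member.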

88. The method of claim 85, further comprising:
generating a description of an operation that can be performed on the first
document description file and the second document description file.

89. A composite document description file, stored on a computer-readable
medium, for describing a combination of a first document description file and a
second document description file, the composite document description file
comprising:

a description of the combination;
a description of the first document description file; and
a description of the second document description file.

90. The document description file of claim 89, wherein the combination is an
aggregation of the first document description file and the second document description
file.

91. The document description file of claim 89, wherein the combination is an
alternation of the first document description file and the second document description
file.

92. The document description file of claim 89, further comprising a description of
an operation that can be performed on the first document description file and the
second document description file.